1. INTRODUCTION

A criterion used for assessing the performance of machine-processing algorithms 
during the Large Area Crop Inventory Experiment (LACIE) was the variance reduc- 
tion factor. With the current processing procedure, Procedure 1, the observed 
variance reduction factor is high — around 0.75. This high value may be 
attributed to the low recognition accuracy of the current classification 
methods and to the need for estimating a large number of parameters. In addi- 
tion, current procedures do not use scene spatial information in the clas- 
sifiers; they use a separate set of labeled patterns (type 2 dots) for bias 
correction. 

This plan proposes to evaluate classifiers that have significantly fewer 
parameters to estimate and use type 1 dots more effectively for estimating 
the biases. In particular, it is proposed to evaluate linear and piecewise 
linear classifiers and use leave-one-out methods for estimating the biases 
directly from type 1 dots. 

2. SAMPLE SIZE AND DIMENSIONALITY 

The error rates and hence the variance reduction factor estimates are a func- 
tion of a particular sample used in the estimation. The expected values of 
the errors for a linear classifier can be derived as a function of training 
sample size and dimensionality (refs. 1 and 2). It has been observed that 
the minimum required ratio of number of samples to dimensionality per class 
is 3. Hence, this ratio will be used in these evaluation experiments. For 
16-dimensional data, then, the required number of samples per class is 48.

3. EVALUATION PROCEDURE 

It is proposed to use 25 segments in the evaluation. The rationale for this 
number is given in reference 3, and the details of the particular segments 
selected are given in table 1. The variance reduction factor R will be used 
as a criterion for evaluation. The factor R is defined as 


$$R = \frac{\sum_{i=1}^{M} \frac{N_i}{N} P_i (1 - P_i)}{P (1 - P)}$$

where

M is the total number of classes,

N is the total number of picture elements (pixels),

$N_i$ is the total number of pixels in class i,

P is the overall proportion of small grains, and

$P_i$ is the proportion of small grains in class i.


TABLE 1.- SEGMENTS TO BE USED IN THE EVALUATION OF CLASSIFIERS

Segment   Location (county, state)   Type of wheat   Proportion of small grains

1005      Cheyenne, Colo.            Winter          0.347
1032      Wichita, Kans.             Winter          0.386
1033      Clark, Kans.               Winter          0.095
1853      Ness, Kans.                Winter          0.303
1061      Kearny, Kans.              Winter          0.253
1512      Clay, Minn.                Spring          0.337
1520      Big Stone, Minn.           Spring          0.308
1544      Sheridan, Mont.            Spring          0.383
1739      Teton, Mont.               Mixed           0.244
1582      Hays, Neb.                 Winter          0.194
1604      Renville, N. Dak.          Spring          0.524
1606      Ward, N. Dak.              Spring          0.329
1648      Bowman, N. Dak.            Spring          0.379
1661      McIntosh, N. Dak.          Spring          0.410
1902      McKenzie, N. Dak.          Spring          0.086
1231      Jackson, Okla.             Winter          0.741
1242      Canadian, Okla.            Winter          0.472
1367      Major, Okla.               Winter          0.540
1677      Spink, S. Dak.             Spring          0.341
1690      Kingsbury, S. Dak.         Spring          0.213
1803      Shannon, S. Dak.           Winter          0.011
1805      Gregory, S. Dak.           Mixed           0.158
1056      Moore, Tex.                Winter          0.226
1059      Ochiltree, Tex.            Winter          0.445
1060      Sherman, Tex.              Winter          0.231
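
As an illustration of how the factor R behaves, the following minimal sketch computes it from per-class pixel counts and per-class small-grain proportions. The function name and the example numbers are hypothetical, and the sketch assumes that the overall proportion P equals the size-weighted mean of the class proportions.

```python
def variance_reduction_factor(class_sizes, class_proportions):
    """R = sum_i (N_i / N) P_i (1 - P_i) / (P (1 - P)).

    class_sizes[i] is N_i and class_proportions[i] is P_i.
    Assumption: P is the size-weighted mean of the P_i.
    """
    n_total = sum(class_sizes)
    p_overall = sum(n * p for n, p in zip(class_sizes, class_proportions)) / n_total
    within = sum((n / n_total) * p * (1.0 - p)
                 for n, p in zip(class_sizes, class_proportions))
    return within / (p_overall * (1.0 - p_overall))


# Hypothetical two-class segment (illustrative numbers only).
r_example = variance_reduction_factor([600, 400], [0.9, 0.1])

# Classes that carry no information about small grains give R = 1;
# classes that separate small grains perfectly give R = 0.
r_uninformative = variance_reduction_factor([500, 500], [0.3, 0.3])
r_perfect = variance_reduction_factor([300, 700], [1.0, 0.0])
```

A smaller R therefore indicates that the classes are more homogeneous with respect to small grains.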

The proportion estimation procedures proposed for implementation with and 
without context are as follows. The estimated probability of occurrence of 
wheat is 


$$\hat{P}(W) = \sum_{i=1}^{M} P(W \mid i)\, \hat{P}(i)$$

where

$\hat{P}(W)$ is the estimated probability of occurrence of wheat;

$\hat{P}(i)$ is the classifier-estimated probability of occurrence of class i; and

$P(W \mid i)$ is the probability of occurrence of wheat, given the classifier
decision as class i.


The $\hat{P}(i)$'s are estimated in the usual way. For estimating $P(W \mid i)$, the
leave-one-out method is used for the selected classifiers with and without
context. The $P(W \mid i)$ is estimated from type 1 dots. Suppose that there are
$N_{SG}$ patterns of small grains and $N_O$ patterns of other in type 1 dots, where
$N = N_{SG} + N_O$ is the total number of type 1 dots. Using the leave-one-out
method, suppose the following situation:

$N_{SG,SG}$ = number of dots labeled by the analyst as small grains and counted
as small grains by the leave-one-out method;

$N_{SG,O}$ = number of dots labeled by the analyst as small grains and counted
as other by the leave-one-out method;

$N_{O,SG}$ = number of dots labeled by the analyst as other and counted as small
grains by the leave-one-out method; and

$N_{O,O}$ = number of dots labeled by the analyst as other and counted as other
by the leave-one-out method;

where

$$N_{SG} = N_{SG,SG} + N_{SG,O}, \qquad N_O = N_{O,SG} + N_{O,O}$$


Then $P(W \mid i)$ is estimated as follows:

$$\hat{P}(W \mid SG) = \frac{N_{SG,SG}}{N_{SG,SG} + N_{O,SG}}$$

and

$$\hat{P}(W \mid O) = \frac{N_{SG,O}}{N_{SG,O} + N_{O,O}}$$
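
The estimate can be illustrated with a short sketch that turns the four leave-one-out counts into the conditional probabilities and then into the wheat proportion. The function names, the counts, and the class probabilities below are hypothetical; the sketch also assumes that both classifier decisions occur at least once, so that no denominator is zero.

```python
def wheat_given_decision(n_sg_sg, n_sg_o, n_o_sg, n_o_o):
    """Return (P(W|SG), P(W|O)) from leave-one-out counts.

    n_a_b counts type 1 dots labeled a by the analyst and counted as b
    by the leave-one-out method.
    """
    p_w_given_sg = n_sg_sg / (n_sg_sg + n_o_sg)
    p_w_given_o = n_sg_o / (n_sg_o + n_o_o)
    return p_w_given_sg, p_w_given_o


def wheat_proportion(p_w_given_class, p_class):
    """P(W) = sum_i P(W|i) P(i)."""
    return sum(pw * pc for pw, pc in zip(p_w_given_class, p_class))


# Hypothetical counts from 100 type 1 dots and hypothetical classifier
# proportions P(SG) = 0.35, P(O) = 0.65.
p_sg, p_o = wheat_given_decision(n_sg_sg=40, n_sg_o=8, n_o_sg=5, n_o_o=47)
p_wheat = wheat_proportion([p_sg, p_o], [0.35, 0.65])
```

With a perfect classifier the two conditional probabilities would be 1 and 0, and the estimate would reduce to the classifier proportion of small grains.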


The variance reduction factor R will be computed for all the segments and for 
the selected classifiers with and without context. Based on R, the classifi- 
cation procedures will be compared with the current Procedure 1 and with the 
other cluster-based Procedure 1 being tested (ref. 3). Biases calculated 
from type 1 dots will be compared with those obtained from type 2 dots for
all the segments. 


4. USE OF THE LEAVE-ONE-OUT METHOD FOR ESTIMATING THE 
BIASES DIRECTLY FROM TYPE 1 DOTS 


This section justifies the use of the leave-one-out method for estimating the
biases directly from type 1 dots. Let $\theta_1$ be a set of parameters of the
distributions used to design the classifier, and let $\theta_2$ be a set of parameters
of the distributions of the patterns used to test the performance of the
classifier.

Let $\varepsilon(\theta_1, \theta_2)$ be the resulting error when a classifier is designed on a set of
patterns from distributions with parameters $\theta_1$ and tested on a set of patterns
from distributions with parameters $\theta_2$. Let $\theta$ and $\hat{\theta}$ be the set of true
parameters and its estimate. The estimate $\hat{\theta}$ is a random vector and depends on
the particular sample used in estimating it. Let $\hat{\theta}_N$ be a particular value of
$\hat{\theta}$. Then,

$$\varepsilon(\theta, \theta) \le \varepsilon(\hat{\theta}_N, \theta)$$

Taking expectations on both sides, one obtains

$$\varepsilon(\theta, \theta) \le E[\varepsilon(\hat{\theta}_N, \theta)]$$

One of the ways of estimating the quantity on the right-hand side is the
leave-one-out method. This method is as follows. If there is a total of N
patterns, leave out one pattern, design the classifier on the remaining
(N - 1) patterns, and test on the pattern that is left out. Repeat this
procedure N times, each time leaving out a different pattern. The estimated
errors will be the estimates of the omission and commission errors and will
be used in this evaluation procedure for bias correction.
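
The procedure can be sketched as follows. The nearest-mean classifier used here is only a stand-in chosen to keep the example short; it is not one of the classifiers proposed in this plan, and the function names and data are hypothetical.

```python
def train_nearest_mean(patterns, labels):
    """Design step of the stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(patterns, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def classify(means, x):
    """Assign x to the class whose mean is closest in squared distance."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda y: sq_dist(means[y]))


def leave_one_out_counts(patterns, labels):
    """Return counts[(analyst_label, assigned_label)] over N design/test rounds."""
    counts = {}
    for k in range(len(patterns)):
        rest_x = patterns[:k] + patterns[k + 1:]      # the (N - 1) design patterns
        rest_y = labels[:k] + labels[k + 1:]
        assigned = classify(train_nearest_mean(rest_x, rest_y), patterns[k])
        key = (labels[k], assigned)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical, well-separated two-class data.
patterns = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [0.9, 1.0], [1.2, 0.8]]
labels = ["SG", "SG", "SG", "O", "O", "O"]
counts = leave_one_out_counts(patterns, labels)
```

The off-diagonal entries of the resulting count table are the omission and commission counts. Redesigning the classifier N times in this direct way is costly, which is the motivation for recursive update expressions.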

5. LINEAR CLASSIFIERS 

The linear classifiers selected for evaluation are the (a) Fisher classifier, 
which is parametric (refs. 4 and 5); (b) error correction classifier, which 
is nonparametric (refs. 6 and 7); and (c) classifier with uncertain labels, 
which is parametric. This section briefly describes these classifiers and 
presents computationally efficient methods for the use of the leave-one-out 
method with these classifiers. 

5.1 FISHER CLASSIFIER 

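Suppose that there are two classes. The training patterns from classes $\omega_1$
and $\omega_2$ are

$$X_1^1, X_2^1, \ldots, X_{N_1}^1 \quad \text{and} \quad X_1^2, X_2^2, \ldots, X_{N_2}^2$$

The means and covariance matrices of the patterns in the classes are estimated
as

$$\hat{m}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} X_j^i, \qquad \hat{\Sigma}_i = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} (X_j^i - \hat{m}_i)(X_j^i - \hat{m}_i)^T, \qquad i = 1, 2$$

The Fisher weight vectors are given by

$$V_i = S_W^{-1} \hat{m}_i, \qquad v_i = -\frac{1}{2}\, \hat{m}_i^T S_W^{-1} (\hat{m}_1 + \hat{m}_2), \qquad i = 1, 2$$

where $S_W = \hat{\Sigma}_1 + \hat{\Sigma}_2$. The Fisher decision rule is as follows:

decide $X \in \omega_1$ if $V_1^T X + v_1 > V_2^T X + v_2$;

decide $X \in \omega_2$ otherwise.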

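For use with the leave-one-out method, recursive expressions for computing the
weight vectors are given below. Let a pattern $X_k^1$ from class $\omega_1$ be left out.
Define the means and covariance matrices of the total pattern set as follows:

$$m_i = \frac{1}{N_i} \sum_{j=1}^{N_i} X_j^i, \qquad i = 1, 2$$

$$\Sigma_1 = \frac{1}{N_1 - 2} \sum_{j=1}^{N_1} (X_j^1 - m_1)(X_j^1 - m_1)^T$$

$$\Sigma_2 = \frac{1}{N_2 - 1} \sum_{j=1}^{N_2} (X_j^2 - m_2)(X_j^2 - m_2)^T$$

Let $S_W = \Sigma_1 + \Sigma_2$. Compute $V_i$, $v_i$, i = 1, 2 as

$$V_i = S_W^{-1} m_i, \qquad v_i = -\frac{1}{2}\, m_i^T S_W^{-1} (m_1 + m_2)$$

Note that $\Sigma_1$ is defined differently from the usual unbiased estimate for
covariance matrices. It is defined thus for mathematical simplicity and will
not affect the results. The weight vectors when a pattern $X_k^1$ from class $\omega_1$
was left out become

$$V_1(X_k^1) = S_{W1k}^{-1}\, m_{1k}$$

$$v_1(X_k^1) = -\frac{1}{2}\, m_{1k}^T S_{W1k}^{-1} (m_{1k} + m_2)$$

$$V_2(X_k^1) = S_{W1k}^{-1}\, m_2$$

$$v_2(X_k^1) = -\frac{1}{2}\, m_2^T S_{W1k}^{-1} (m_{1k} + m_2)$$

where

$$m_{1k} = \frac{1}{N_1 - 1} \sum_{j \ne k} X_j^1$$

$$\Sigma_{1k} = \frac{1}{N_1 - 2} \sum_{j \ne k} (X_j^1 - m_{1k})(X_j^1 - m_{1k})^T$$

$$S_{W1k} = \Sigma_{1k} + \Sigma_2$$

The $m_2$ and $\Sigma_2$ are defined as before. With these, $V_i(X_k^1)$ and $v_i(X_k^1)$ can be
expressed in terms of $V_i$ and $v_i$ as follows.

Let

$$\alpha = \frac{N_1}{(N_1 - 1)(N_1 - 2)}$$

$$\beta(X_k^1) = (X_k^1 - m_1)^T S_W^{-1} (X_k^1 - m_1)$$

$$\gamma(X_k^1) = 1 - \alpha \beta(X_k^1)$$

$$h = S_W^{-1} (X_k^1 - m_1)$$

$$d_1 = (X_k^1 - m_1)^T V_1$$

$$d_2 = \frac{1}{2} (X_k^1 - m_1)^T S_W^{-1} (m_1 + m_2)$$

Then

$$V_1(X_k^1) = V_1 + \frac{\alpha d_1}{\gamma(X_k^1)}\, h - \frac{1}{(N_1 - 1)\gamma(X_k^1)}\, h$$

$$v_1(X_k^1) = v_1 - \frac{\alpha d_1 d_2}{\gamma(X_k^1)} + \frac{d_2}{(N_1 - 1)\gamma(X_k^1)} + \frac{d_1}{2(N_1 - 1)\gamma(X_k^1)} - \frac{\beta(X_k^1)}{2(N_1 - 1)^2 \gamma(X_k^1)}$$

Let

$$d_3 = (X_k^1 - m_1)^T V_2$$

Then

$$V_2(X_k^1) = V_2 + \frac{\alpha d_3}{\gamma(X_k^1)}\, h$$

$$v_2(X_k^1) = v_2 + \frac{d_3}{2(N_1 - 1)} - \frac{\alpha d_2 d_3}{\gamma(X_k^1)} + \frac{\alpha d_3 \beta(X_k^1)}{2(N_1 - 1)\gamma(X_k^1)}$$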
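
A small numerical check can make the recursion concrete. The sketch below is restricted to two-dimensional patterns so that plain 2x2 matrix helpers suffice. It assumes that the class 1 covariance of the full set uses the divisor N_1 - 2 and that the leave-one-out update of the weight vector V_1 is the rank-one (Sherman-Morrison) correction written in the comments. All function names and data are hypothetical, and only the V_1 update is checked against a direct redesign.

```python
def mean2(xs):
    n = len(xs)
    return [sum(x[0] for x in xs) / n, sum(x[1] for x in xs) / n]


def covariance2(xs, m, divisor):
    """2x2 matrix: sum of (x - m)(x - m)^T over xs, divided by `divisor`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        d = [x[0] - m[0], x[1] - m[1]]
        for r in range(2):
            for c in range(2):
                s[r][c] += d[r] * d[c] / divisor
    return s


def inverse2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matvec2(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1],
            a[1][0] * v[0] + a[1][1] * v[1]]


def dot2(u, v):
    return u[0] * v[0] + u[1] * v[1]


def fisher_parts(class1, class2, divisor1):
    """Return (V_1, S_W^{-1}, m_1); Sigma_1 uses divisor1, Sigma_2 uses N_2 - 1."""
    m1, m2 = mean2(class1), mean2(class2)
    s1 = covariance2(class1, m1, divisor1)
    s2 = covariance2(class2, m2, len(class2) - 1)
    sw_inv = inverse2([[s1[r][c] + s2[r][c] for c in range(2)] for r in range(2)])
    return matvec2(sw_inv, m1), sw_inv, m1


def v1_recursive(class1, class2, k):
    """V_1 after leaving out class1[k], computed from full-set quantities only."""
    n1 = len(class1)
    v1, sw_inv, m1 = fisher_parts(class1, class2, n1 - 2)
    y = [class1[k][0] - m1[0], class1[k][1] - m1[1]]
    alpha = n1 / ((n1 - 1) * (n1 - 2))
    h = matvec2(sw_inv, y)
    gamma = 1.0 - alpha * dot2(y, h)            # gamma = 1 - alpha * beta
    # V_1(x_k) = V_1 + (alpha d_1 / gamma) h - h / ((N_1 - 1) gamma)
    coeff = (alpha * dot2(y, v1) - 1.0 / (n1 - 1)) / gamma
    return [v1[0] + coeff * h[0], v1[1] + coeff * h[1]]


def v1_direct(class1, class2, k):
    """V_1 recomputed from scratch on the N_1 - 1 remaining class 1 patterns."""
    reduced = class1[:k] + class1[k + 1:]
    return fisher_parts(reduced, class2, len(reduced) - 1)[0]


# Hypothetical data: four patterns per class.
class1 = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.5], [2.0, 1.0]]
class2 = [[3.0, 3.0], [4.0, 2.5], [3.5, 4.0], [5.0, 3.5]]
max_gap = max(abs(a - b)
              for k in range(len(class1))
              for a, b in zip(v1_recursive(class1, class2, k),
                              v1_direct(class1, class2, k)))
```

The recursive form needs only one matrix inversion for the whole pattern set, whereas the direct form inverts a new matrix for every left-out pattern.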

5.2 ERROR CORRECTION CLASSIFIER 

Another classifier selected for evaluation is a candidate from a nonparametric 
family of classifiers. It is an error correction classifier that uses linear 
discriminant functions for each class. 

$$g_i(X) = W_i^T X, \qquad i = 1, 2, \ldots, M$$

The decision rule is to decide a pattern $X \in \omega_i$ if

$$g_i(X) > g_j(X), \qquad j = 1, 2, \ldots, M, \quad j \ne i$$


A brief description of the algorithm is as follows. Suppose that training
patterns $X_1^i, X_2^i, \ldots, X_{N_i}^i$; i = 1, 2, ..., M are given. From the training
patterns of class $\omega_i$, form a matrix A(i),

$$A(i) = \begin{bmatrix} (X_1^i)^T \\ (X_2^i)^T \\ \vdots \\ (X_{N_i}^i)^T \end{bmatrix}, \qquad i = 1, 2, \ldots, M$$

Let

$$A = \begin{bmatrix} A(1) \\ A(2) \\ \vdots \\ A(M) \end{bmatrix}, \qquad W = [W_1, W_2, \ldots, W_M]$$

For a particular initial matrix B(0), form

$$A W(0) = B(0)$$

and obtain the initial weight vector matrix W(0),

$$W(0) = (A^T A)^{-1} A^T B(0)$$

Adjust the B( )'s and W( )'s simultaneously until the correction process stops or
for a predetermined number of iterations and take the weight vector matrix at
that point as the solution weight vector matrix.

The implementation of the leave-one-out method with this classifier is as
follows. Compute the B-matrix with the solution weight vector matrix at the
point of stopping. That is,

$$B = AW$$

Let a pattern X from class $\omega_i$ be left out. Let the corresponding row of
the B-matrix be $C^T$. That is,

$$A = \begin{bmatrix} A_{N-1} \\ X^T \end{bmatrix}, \qquad B = \begin{bmatrix} B_{N-1} \\ C^T \end{bmatrix}$$

where $C^T = [C_1, C_2, \ldots, C_M]$. Let

$$W = (A^T A)^{-1} A^T B$$

and

$$W_{N-1} = (A_{N-1}^T A_{N-1})^{-1} A_{N-1}^T B_{N-1}$$

The relationship between the matrices W and $W_{N-1}$ can be derived as

$$W_{N-1} = W + \frac{(A^T A)^{-1} X (X^T W - C^T)}{1 - \theta}$$

where $\theta = X^T (A^T A)^{-1} X$. Every time a pattern is left out, a new weight vector
matrix is recursively computed and the left-out pattern is tested, and the
biases can be computed.
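
The relationship between the two weight matrices can be checked numerically with a short sketch. It is restricted to two-dimensional patterns so that a 2x2 inverse is enough, and it treats B as a fixed target matrix, which is an assumption made only for this illustration. The function names and data are hypothetical.

```python
def inverse2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def normal_matrix(rows):
    """A^T A for pattern rows of length 2."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for x in rows:
        for r in range(2):
            for c in range(2):
                g[r][c] += x[r] * x[c]
    return g


def least_squares(rows, targets):
    """W = (A^T A)^{-1} A^T B, returned as 2 rows of M entries."""
    p = inverse2(normal_matrix(rows))
    m = len(targets[0])
    atb = [[sum(x[r] * b[c] for x, b in zip(rows, targets)) for c in range(m)]
           for r in range(2)]
    return [[p[r][0] * atb[0][c] + p[r][1] * atb[1][c] for c in range(m)]
            for r in range(2)]


def leave_one_out_weights(rows, targets, k):
    """W_{N-1} = W + (A^T A)^{-1} X (X^T W - C^T) / (1 - theta)."""
    w = least_squares(rows, targets)
    p = inverse2(normal_matrix(rows))
    x, c_row = rows[k], targets[k]
    px = [p[0][0] * x[0] + p[0][1] * x[1], p[1][0] * x[0] + p[1][1] * x[1]]
    theta = x[0] * px[0] + x[1] * px[1]          # theta = X^T (A^T A)^{-1} X
    m = len(c_row)
    residual = [x[0] * w[0][c] + x[1] * w[1][c] - c_row[c] for c in range(m)]
    return [[w[r][c] + px[r] * residual[c] / (1.0 - theta) for c in range(m)]
            for r in range(2)]


# Hypothetical pattern rows (A) and target rows (B) for M = 2 classes.
rows = [[1.0, 0.2], [0.8, 1.0], [0.1, 1.2], [1.5, 0.3], [0.4, 0.9]]
targets = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.9, 0.1], [0.1, 1.0]]

max_gap = 0.0
for k in range(len(rows)):
    updated = leave_one_out_weights(rows, targets, k)
    direct = least_squares(rows[:k] + rows[k + 1:], targets[:k] + targets[k + 1:])
    for r in range(2):
        for c in range(2):
            max_gap = max(max_gap, abs(updated[r][c] - direct[r][c]))
```

The update reuses the inverse computed once for the full pattern set, so each left-out pattern costs a few vector products instead of a new inversion.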
5.3 CLASSIFIER WITH LABELING UNCERTAINTIES


The training patterns and classes are described in section 5.2. Let $a_j^i$ be an
M-dimensional vector associated with the pattern $X_j^i$; its $\ell$th component could be
the probability that the label of the pattern $X_j^i$ is the $\ell$th class.
In the absence of this knowledge, if the training pattern comes from class i,
the ith component of $a_j^i$ could be set to 1 and the rest of the components to
zero. The $a_j^i$ can also be set so as to map the patterns of each class into
vertices of a simplex.


It is proposed to use the weight vectors that minimize the mapping errors in
the minimum mean square error sense to the vectors $a_j^i$. Then the criterion is

$$J = \sum_{i=1}^{M} \sum_{j=1}^{N_i} \left\| W^T X_j^i - a_j^i \right\|^2$$

The weight vector matrix W that minimizes this criterion is given by

$$W = S^{-1} B$$

where

$$S = \sum_{i=1}^{M} \sum_{j=1}^{N_i} X_j^i (X_j^i)^T$$

and

$$B = \sum_{i=1}^{M} \sum_{j=1}^{N_i} X_j^i (a_j^i)^T$$
Suppose a pattern $X_k^i$ from class $\omega_i$ is left out. Then the weight vector matrix
$W(X_k^i)$ is related to W as

$$W(X_k^i) = W + \frac{S^{-1} X_k^i \left[ (X_k^i)^T W - (a_k^i)^T \right]}{1 - \theta}$$

where

$$\theta = (X_k^i)^T S^{-1} X_k^i$$




6. INCORPORATION OF SPATIAL INFORMATION 

It is proposed to incorporate the spatial information into the classification 
through transition probabilities. The dependencies in adjacent pixels are 
modeled as follows. If I and J are neighboring pixels, 

$$P(I = \omega_i \mid J = \omega_i) = (1 - \theta) P(I = \omega_i) + \theta$$

and

$$P(I = \omega_i \mid J = \omega_j) = (1 - \theta) P(I = \omega_i), \qquad j \ne i$$

where the parameter $\theta$ controls the dependencies between neighboring pixels.
The value $\theta = 1$ represents complete dependence, and $\theta = 0$ represents complete
independence. The following two models that consider spatial information
will be investigated. The a posteriori probabilities for use with these
algorithms will be estimated from the outputs of linear classifiers.
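
The dependence model can be illustrated by building the full table of transition probabilities for a given set of class probabilities. The function name and the numbers are hypothetical.

```python
def transition_matrix(priors, theta):
    """t[j][i] = P(I = w_i | J = w_j) = (1 - theta) P(w_i) + theta * [i == j]."""
    m = len(priors)
    return [[(1.0 - theta) * priors[i] + (theta if i == j else 0.0)
             for i in range(m)]
            for j in range(m)]


priors = [0.3, 0.7]                              # hypothetical class probabilities
t_mixed = transition_matrix(priors, 0.6)
t_independent = transition_matrix(priors, 0.0)   # every row equals the priors
t_dependent = transition_matrix(priors, 1.0)     # identity matrix
```

Each row sums to 1 for any value of the dependence parameter between 0 and 1, so the model always defines a valid conditional distribution.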


6.1 TWO-DIMENSIONAL SPATIALLY UNIFORM CONTEXT

Consider a neighborhood of 9 pixels shown in the following figure.


    8   1   2

    7   0   3

    6   5   4


Figure 1.- Illustration of 3x3 neighborhood.
Suppose that the pixel 0 is under consideration and that pixels 1 through 8
are its neighbors. Decide the pixel $X_0 \in \omega_\ell$ for the class $\ell$ that maximizes

$$P(X_0 \mid \omega_0 = \ell)\, P(\omega_0 = \ell) \prod_{i=1}^{8} \left[ \theta\, P(X_i \mid \omega_i = \ell) + (1 - \theta) \sum_{j=1}^{M} P(X_i \mid \omega_i = j)\, P(\omega_i = j) \right]$$
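
The decision rule can be sketched as follows. The class-conditional likelihoods are passed in as plain lists and are hypothetical inputs; how they are obtained from the classifier outputs is outside the scope of this sketch, and the function name is an assumption.

```python
def uniform_context_label(center_lik, neighbor_liks, priors, theta):
    """Return (label, scores) where scores[l] is
    P(X_0|l) P(l) * prod_i [theta P(X_i|l) + (1 - theta) sum_j P(X_i|j) P(j)].

    center_lik[l] is P(X_0 | class l); neighbor_liks[i][l] is P(X_i | class l).
    """
    m = len(priors)
    scores = []
    for l in range(m):
        score = center_lik[l] * priors[l]
        for lik in neighbor_liks:
            mixture = sum(lik[j] * priors[j] for j in range(m))
            score *= theta * lik[l] + (1.0 - theta) * mixture
        scores.append(score)
    label = max(range(m), key=lambda l: scores[l])
    return label, scores


# Hypothetical case: the center pixel weakly favors class 1, while all
# eight neighbors strongly favor class 0.
priors = [0.5, 0.5]
center = [0.4, 0.6]
neighbors = [[0.9, 0.1]] * 8

label_with_context, _ = uniform_context_label(center, neighbors, priors, 0.5)
label_without_context, _ = uniform_context_label(center, neighbors, priors, 0.0)
```

With the dependence parameter set to 0 the neighbor factors are identical for every class, so the rule reduces to the ordinary decision for the center pixel alone.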


6.2 SEQUENTIAL OR MARKOVIAN DEPENDENCE 

This section considers the sequential Markovian dependence between neighboring
pixels with the transition probabilities described in section 6 in terms of
the parameter $\theta$. Sequential look-before and look-ahead type of context will be
used with the classifier:


    X_1    X_2    X_3    ...    X_{n-1}    X_n    X_{n+1}    ...

    ω_1    ω_2    ω_3    ...    ω_{n-1}    ω_n    ω_{n+1}    ...


Figure 2.- Illustration of pixels and labels on scan line.


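Suppose we want to find the label of pattern $X_n$, using look-before and
look-ahead type of context. The a posteriori probabilities using the
context are given by

$$P(\omega_n = k \mid X_1, \ldots, X_n) = \frac{P(X_n \mid \omega_n = k) \left[ (1 - \theta) P(\omega_n = k) + \theta P(\omega_{n-1} = k \mid X_1, \ldots, X_{n-1}) \right]}{\sum_{j=1}^{M} P(X_n \mid \omega_n = j) \left[ (1 - \theta) P(\omega_n = j) + \theta P(\omega_{n-1} = j \mid X_1, \ldots, X_{n-1}) \right]}$$

and

$$P(\omega_n = k \mid X_1, \ldots, X_{n+1}) = \frac{P(\omega_n = k \mid X_1, \ldots, X_n) \left[ (1 - \theta) \sum_{j=1}^{M} P(X_{n+1} \mid \omega_{n+1} = j) P(\omega_{n+1} = j) + \theta P(X_{n+1} \mid \omega_{n+1} = k) \right]}{\sum_{l=1}^{M} P(\omega_n = l \mid X_1, \ldots, X_n) \left[ (1 - \theta) \sum_{j=1}^{M} P(X_{n+1} \mid \omega_{n+1} = j) P(\omega_{n+1} = j) + \theta P(X_{n+1} \mid \omega_{n+1} = l) \right]}$$

This sequential algorithm will be applied to find the label of a center pixel
0 in a 3x3 neighborhood as follows.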



Figure 3.- Sequential contextual algorithm for a 3x3 neighborhood. 

Pixels 8, 1, and 2 will be used to find the posteriori probabilities for 

pixel 1 and similarly for pixels 7, 0, 3 and for pixels 6, 5, 4. Finally, 

pixels 1, 0, 5 will be used to find the label of the pixel under consideration, 0. 
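
The look-before and look-ahead steps for one triple of pixels can be sketched as follows. The sketch assumes the transition model of section 6 with a single dependence parameter, starts the scan from the class probabilities alone at the first pixel, and uses hypothetical likelihood values and function names.

```python
def normalize(terms):
    total = sum(terms)
    return [t / total for t in terms]


def look_before(lik_n, priors, prev_posterior, theta):
    """P(w_n = k | X_1..X_n) from P(w_{n-1} = k | X_1..X_{n-1})."""
    return normalize([lik_n[k] * ((1.0 - theta) * priors[k] + theta * prev_posterior[k])
                      for k in range(len(priors))])


def look_ahead(posterior_n, lik_next, priors, theta):
    """P(w_n = k | X_1..X_{n+1}) from P(w_n = k | X_1..X_n)."""
    mixture = sum(l * p for l, p in zip(lik_next, priors))
    return normalize([posterior_n[k] * ((1.0 - theta) * mixture + theta * lik_next[k])
                      for k in range(len(priors))])


def label_middle_pixel(lik_prev, lik_n, lik_next, priors, theta):
    """Label the middle pixel of a triple using both neighbors."""
    # Assumption: the first pixel has no predecessor, so its posterior
    # uses the class probabilities only.
    first = normalize([l * p for l, p in zip(lik_prev, priors)])
    before = look_before(lik_n, priors, first, theta)
    both = look_ahead(before, lik_next, priors, theta)
    return max(range(len(priors)), key=lambda k: both[k]), both


# Hypothetical triple: the middle pixel weakly favors class 1, while
# both neighbors strongly favor class 0.
priors = [0.5, 0.5]
label_ctx, post_ctx = label_middle_pixel([0.9, 0.1], [0.45, 0.55], [0.9, 0.1], priors, 0.8)
label_plain, post_plain = label_middle_pixel([0.9, 0.1], [0.45, 0.55], [0.9, 0.1], priors, 0.0)
```

For the 3x3 neighborhood the same function would be applied to each of the three rows and then to the column through the center pixel, with the likelihoods in the final pass replaced by the quantities carried over from the row passes.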

7. CLASSIFIER DESIGN WITH IMPERFECT LABELS 

The techniques for handling imperfections in the labels developed in references 8
and 9 will be implemented and evaluated. To conduct the investigations proposed
in this plan, the required computer time will be at least 45 computer processing
unit hours. This time allows for generating the necessary data files. It is
estimated that this task requires two people over a period of 6 months.
Software support is required for developing the necessary software.